We present the design and evaluation of a quantum carry-lookahead adder (QCLA) using
measurement-based quantum computation (MBQC), called MBQCLA. QCLA was originally
designed for an abstract, concurrent architecture supporting long-distance
communication, but most realistic architectures heavily constrain communication
distances. The quantum carry-lookahead adder is faster than a quantum ripple-carry adder;
QCLA has logarithmic depth while ripple adders have linear depth. MBQCLA utilizes
MBQC's ability to transfer quantum states in unit time to accelerate addition.
MBQCLA breaks the latency limit of addition circuits in nearest-neighbor-only architectures:
compared to the $\Theta(n)$ limit on circuit depth for linear nearest-neighbor architectures, it
can reach $\Theta(\log n)$ depth. MBQCLA is an order of magnitude faster than a ripple-carry
adder when adding registers longer than 100 qubits, but requires a cluster state that is
an order of magnitude larger. The cluster state resources can be classified as
computation and communication; for the unoptimized form, $\approx$ 88% of the resources are used
for communication. Hand optimization of horizontal communication costs results in a $\approx$
12% reduction in spatial resources for the in-place MBQCLA circuit. For comparison, a
graph state quantum carry-lookahead adder (GSQCLA) uses only $\approx$ 9% of the spatial
resources of the MBQCLA.




1. Introduction 

Measurement-based quantum computation (MBQC) is a new paradigm for implementing
quantum algorithms using a quantum cluster state. A cluster state
is a highly entangled state of qubits which can serve as the resource for universal
quantum computation. By subsequent single-qubit measurements, quantum gates
are effected on the logical qubits encoded in the cluster state. Quantum information
propagation in a cluster is driven by the pattern of measurement bases, regardless of
the measurement outcomes. MBQC is attractive because cluster states are considered
to be easy to create on systems ranging from the polarization state of photons
to Josephson junction qubits.

A cluster state can be built on a two-dimensional rectangular lattice with Man- 
hattan geometry. Some of the qubits in the cluster state are data qubits, while 
the rest are created in a generic entangled state. Employing quantum correlations 
for quantum computation, quantum gates on the data qubits can be evaluated by 
measuring lattice qubits in a particular basis. All gates in the Clifford group, in- 
cluding CNOT, can be performed in one time step via a large number of concurrent 
measurements. Remarkably, because both wires and SWAP gates are in the Clif- 
ford group, MBQC supports long-distance gates in a single time step even when 
the cluster state is built on a physical system permitting only nearest-neighbor 
interactions.^a

The Toffoli Phase gate, which is not in the Clifford group, can be executed in two 
time steps, where the measurement basis for the second step is selected depending 
on previous measurement outcomes. This adaptive process, which must be cascaded 
through most interesting quantum circuits, determines the overall performance of 
many algorithms. 

Thus, a cluster state can be used to execute arbitrary quantum algorithms. 
MBQC algorithms are often created by mapping known quantum circuits onto 
the cluster state. The challenge is to find application algorithms that match the 
strengths of MBQC. Here, we choose to address the problem of integer addition. 

Addition is a critical subroutine for algorithms such as Shor's algorithm for
factoring large numbers. Addition can be executed in many ways, with
its performance being primarily dependent on carry propagation, which is normally
limited by the physical architecture. The simplest method is ripple-carry addition,
which has depth of $O(n)$ to add two $n$-bit numbers. In a ripple-carry
adder, carry information is propagated from the low-order qubits to the high-order
qubits one step at a time.

The goal of our work is to reduce the execution time of addition on MBQC.
Raussendorf et al. successfully mapped the VBE ripple-carry adder to MBQC, bending
the circuit layout to reduce the spatial resources. However, a ripple-carry
adder does not take good advantage of the strengths of MBQC. By unifying the
quantum carry-lookahead adder (QCLA) with MBQC, we have designed a much
faster circuit for large $n$. In this paper, we present our design for the MBQCLA
and evaluate the design in terms of its execution speed and resource requirements.
The depth and spatial optimizations are also discussed.

The paper is organized as follows: the basic notions of measurement-based quantum
computation and the quantum carry-lookahead adder are given in Section 2.
Section 3 contains the implementation of the MBQC form for QCLA circuits. Here, the
out-of-place, in-place, and optimized in-place circuits are discussed to
obtain their performance and requirements. We conclude in Section 4. Appendix
A gives a detailed exposition of a NOT gate in the cluster state and the graphical
notation used in this paper. Appendix B contains the procedures to implement the
out-of-place and in-place QCLAs in abstract quantum circuit form. Appendix C
provides the requirements and performance for MBQCLA circuits.

a In this paper, we focus on the quantum rather than classical aspects of the system; a Pauli frame
correction based on measurement results may be necessary and will be limited by classical signal
propagation time. Thus, single time step wires depend on the assumption that classical signal
propagation is fast compared to quantum measurements and gates.

2. Background 

Our proposed circuits build on two concepts: (a) measurement-based quantum 
computation, and (b) the quantum carry-lookahead adder. In this section, we will 
present a short review of these concepts. 

2.1. Measurement-Based Quantum Computation

A one-dimensional cluster state is of the form

$$|\Phi\rangle_C = \frac{1}{2^{N/2}} \bigotimes_{a=1}^{N} \left( |0\rangle_a \, \sigma_\nu^{(a+1)} + |1\rangle_a \right), \qquad (1)$$

where $\sigma_\nu^{(i)}$ is the Pauli operator operating on qubit $i$ in the $N$-qubit cluster $C$, $a$ is
the index of a qubit in cluster $C$, and $\nu$ can be $x$, $y$, or $z$ depending on the choice of
interaction Hamiltonian between neighbors, with the convention $\sigma_\nu^{(N+1)} = 1$.
In general, the cluster state should obey the quantum correlation equation

$$\bigotimes_{a,\nu} \sigma_\nu^{(a)} \, |\Phi_{\{k\}}\rangle_C = (-1)^{k_a} \, |\Phi_{\{k\}}\rangle_C, \qquad (2)$$

where $\nu = I, x, y, z$ and $k_a \in \{0, 1\}$. The parameter $\{k\}$ is a set of index parameters
specifying the cluster state. $|\Phi_{\{k\}}\rangle_C$ expresses the cluster state before the
measurement and $|\Psi_{\{k\}}\rangle_{C(g)}$ represents the cluster state after a set of measurements in
which quantum gate $g$ has been simulated on the cluster state.

The cluster state can be created in several ways, e.g., initializing every qubit 
to the |+) state and performing a Controlled-Z gate between each neighboring 
pair. With such a cluster state, Raussendorf et al. showed that a carefully chosen 
measurement pattern can effect any quantum gate on logical qubits. 
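The preparation recipe above can be checked numerically on a small example. The following is a minimal sketch in plain Python, assuming a linear chain of qubits and the standard stabilizer form $K_a = \sigma_x^{(a)} \bigotimes_{b \in \mathrm{nbr}(a)} \sigma_z^{(b)}$ (the all-zero $\{k\}$ case of the correlation equation); the function names are illustrative and not taken from the paper.

```python
def linear_cluster_state(n):
    """Amplitudes of an n-qubit linear cluster state.

    Every qubit starts in |+>, then a Controlled-Z is applied
    between each neighboring pair (a, a+1).
    """
    dim = 1 << n
    amp = [1.0 / (dim ** 0.5)] * dim          # |+>^n: uniform amplitudes
    for a in range(n - 1):
        for idx in range(dim):
            # CZ flips the sign when both qubits a and a+1 are 1
            if (idx >> a) & 1 and (idx >> (a + 1)) & 1:
                amp[idx] = -amp[idx]
    return amp


def apply_stabilizer(amp, a, n):
    """Apply K_a = X_a * prod_{b in neighbors(a)} Z_b to a state vector."""
    neighbors = [b for b in (a - 1, a + 1) if 0 <= b < n]
    out = [0.0] * len(amp)
    for idx, value in enumerate(amp):
        sign = 1
        for b in neighbors:                   # Z_b: sign flip if bit b is 1
            if (idx >> b) & 1:
                sign = -sign
        out[idx ^ (1 << a)] = sign * value    # X_a: flip bit a
    return out


def is_stabilized(n):
    """True if every K_a leaves the n-qubit cluster state unchanged."""
    state = linear_cluster_state(n)
    return all(
        all(abs(x - y) < 1e-12 for x, y in zip(apply_stabilizer(state, a, n), state))
        for a in range(n)
    )
```

Calling `is_stabilized(4)` exercises all four stabilizers of a four-qubit chain; the state vector grows as $2^n$, so this is only practical for a handful of qubits.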

Suppose we have an initial set of cluster state eigenvalue equations for $|\Phi_{\{k\}}\rangle_C$,
representing the cluster which is the union of the input cluster ($C_{input}$), the
machine cluster ($C_{machine}$), and the output cluster ($C_{output}$). All of the qubits except
those in the output cluster are measured by the projective measurement operators $P$ with
a certain measurement pattern $\mathcal{M}$,

$$P^{(C_M)}_{\{s\}}(\mathcal{M}) = \bigotimes_{k \in C_M} \frac{1 + (-1)^{s_k} \, \vec{r}_k \cdot \vec{\sigma}^{(k)}}{2},$$

where $C_M$ denotes the set of measured qubits. The new $m$-qubit output register of the
quantum logic network is the cluster state $|\Psi_{\{k\}}\rangle_{C(g)}$ obeying $2m$ new eigenvalue equations:

$$\sigma_x^{(C_{input}(g),\, i)} \left( U \sigma_x^{(i)} U^\dagger \right)^{(C_{output}(g))} |\Psi\rangle_{C(g)} = (-1)^{\lambda_{x,i}} |\Psi\rangle_{C(g)}, \qquad (3)$$

$$\sigma_z^{(C_{input}(g),\, i)} \left( U \sigma_z^{(i)} U^\dagger \right)^{(C_{output}(g))} |\Psi\rangle_{C(g)} = (-1)^{\lambda_{z,i}} |\Psi\rangle_{C(g)}, \qquad (4)$$

where $\lambda_{x,i}, \lambda_{z,i} \in \{0, 1\}$ are the measurement outcomes.

Quantum computation in Clifford algebra form implicitly appears in the final set of
eigenvalue equations after the measurements. A brief review is given in Appendix
A.
Several remarkable properties follow:

• Measurement of qubits in the machine cluster in the $\sigma_z$-eigenbasis removes them
from the main cluster and disconnects all of their bonds.

• Measurement of qubits in the machine cluster in the $\sigma_x$-eigenbasis removes them
from the main cluster and creates Bell pairs between the qubits in the input cluster
and the qubits in the output cluster.

• Measurement of qubits in the machine cluster in the $\sigma_y$-eigenbasis removes them
from the main cluster and leaves an entangled state between the qubits in the
input cluster and the qubits in the output cluster.

The measurement calculus is a convenient formalism for representing MBQC
quantum gates. Danos et al. show how to write an MBQC quantum gate
U in the form U := ({Resources}_U, {Input}_U, {Output}_U, {EMC}_U). Based on
this definition, we introduce the notation U<n>, meaning a quantum gate in
MBQC using n qubits. CNOT<4> refers to CNOT<4> := ({1, 2, 3, 4}, {1,
4}, {3, 4}, {Xl 3 Zl 2 Z° 2 MgM$E 13 E 2 3E 34 }). A fifteen-qubit form of the gate
is CNOT<15> := ({Resources}_CNOT<15>, {Input}_CNOT<15>, {Output}_CNOT<15>,
{EMC}_CNOT<15>) := ({1, ..., 15}, {1, 9}, {7, 15}, {MC}_CNOT<15>), as
mentioned in the cited reference. There are two types of Toffoli gate: CCNOT<54> and
CCNOT<39>. Both Toffoli gates have similar numbers of adaptive measurements,
but different numbers of qubit resources. CCNOT<39> must be connected
into an arbitrary graph, while CCNOT<54> is appropriate for the Manhattan
geometry cluster state.

The physical implementation of MBQC requires a lattice system with an Ising-like
interaction between the qubits so that the quantum information can be propagated
in the lattice due to the measurement. Several physical implementations
have been proposed; Meier et al. proposed the possibility of experimental realization
to perform initialization, quantum gate operation and read-out mechanism in
antiferromagnetic spin cluster quantum computing. Devitt et al. have described
an all-optical implementation where the required number of photonic modules and
chips only depends on the cross section length of the two-dimensional lattice
(corresponding to the y-axis in our figures).

MBQC runs in two phases: prepare the cluster state, then measure. Because
the preparation step is completely generic, failure in coupling the qubits is not a
problem. Mechanisms that succeed only probabilistically can be used, as long as
failures are heralded, making optical QC suitable for MBQC.

2.2. Quantum Carry-Lookahead Adder 

The Quantum Carry-Lookahead Adder (QCLA) was designed by Draper et al.
The quantum carry-lookahead adder is potentially more efficient than a quantum
ripple-carry adder since its depth is $O(\log n)$. A carry-lookahead adder uses three
phases, the "Generate" (G), "Propagate" (P), and "Kill" (K) networks, each of
which progressively doubles the length of its span in each time step, to calculate
the complete "Carry" values (C). In practice, the networks are somewhat redundant,
and Draper et al. defined their circuit using only the P and G networks
to calculate the final carry C. The out-of-place form of the QCLA performs the
unitary transformation $|a, b, 0\rangle \rightarrow |a, b, a + b\rangle$, and the in-place form calculates
$|a, b\rangle \rightarrow |a, a + b\rangle$, where $|a\rangle$, $|b\rangle$ and $|a + b\rangle$ are $n$-qubit registers in which 0 is the
low-order qubit and $n-1$ is the high-order qubit.

The carry-lookahead adder starts with an initial addition round, consisting of a
half adder for each qubit in the logical register. Starting from the basic idea of the
carry-lookahead adder originally designed for classical binary logic, the carry is
propagated from bit to bit $i \rightarrow j \rightarrow k$, where $i<j<k$, so the carry equations or majority
blocks are represented by:

$$c_j = g[i,j] \oplus p[i,j] \wedge c_i \qquad (5)$$

$$c_k = g[j,k] \oplus p[j,k] \wedge c_j \qquad (6)$$

By straightforward substitution of these equations, we have the equation:

$$c_k = g[j,k] \oplus p[j,k] \wedge \left( g[i,j] \oplus p[i,j] \wedge c_i \right) \qquad (7)$$

or

$$c_k = g[j,k] \oplus p[j,k] \wedge g[i,j] \oplus p[i,j] \wedge p[j,k] \wedge c_i \qquad (8)$$

Substituting $c_k = g[i,k] \oplus p[i,k] \wedge c_i$ into Eq. (8) gives:

$$g[i,k] = g[j,k] \oplus p[j,k] \wedge g[i,j]. \qquad (9)$$

A circuit that performs this computation in a lookahead adder is called the Generate
network. Similarly, a circuit that implements the equations

$$p[i,k] = p[i,j] \wedge p[j,k] \qquad (10)$$

for any $i<j<k$ is called the Propagate network. The implementation of these networks
in reversible computation is realized by the following steps, where $n$ is the number of
logical qubits, $t$ is the round number and $m$ is the index of qubits in the register:
(1) P-rounds. For $t = 1$ to $\lfloor \log n \rfloor - 1$: for $1 \le m < \lfloor n/2^t \rfloor$. The connection between
the steps in this round is expressed by: $P_t[m] \oplus= P_{t-1}[2m] \, P_{t-1}[2m+1]$.

(2) G-rounds. For $t = 1$ to $\lfloor \log n \rfloor$: for $0 \le m < \lfloor n/2^t \rfloor$. The relation between the steps
in the round is: $G[2^t m + 2^t] \oplus= G[2^t m + 2^{t-1}] \, P_{t-1}[2m+1]$.

(3) C-rounds. For $t = \lfloor \log (2n/3) \rfloor$ down to 1: for $1 \le m \le \lfloor (n - 2^{t-1})/2^t \rfloor$.
The connection between the steps in the round is represented by:
$G[2^t m + 2^{t-1}] \oplus= G[2^t m] \, P_{t-1}[2m]$.

These networks will be applied both to out-of-place and in-place QCLA circuits. 
Those circuits, which are distinguished by the form of the addition scheme, are 
explained in more detail in Appendix B. 
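The three kinds of rounds can be exercised classically, since every step is an XOR of an AND of bits. The following minimal sketch assumes the index ranges $1 \le m < \lfloor n/2^t \rfloor$ (P), $0 \le m < \lfloor n/2^t \rfloor$ (G) and $1 \le m \le \lfloor (n-2^{t-1})/2^t \rfloor$ (C), with $G[i+1]$ initialised to $a_i b_i$ and $P_0[i]$ to $a_i \oplus b_i$. It only simulates the bit-level logic on classical inputs, not the quantum circuit, and the helper names are illustrative.

```python
def floor_log2(x):
    """floor(log2(x)) for a positive integer x; returns -1 for x == 0."""
    return x.bit_length() - 1


def qcla_carries(a_bits, b_bits):
    """Carry bits c_1..c_n computed with P-, G- and C-rounds.

    a_bits and b_bits are lists of 0/1 values, low-order bit first.
    """
    n = len(a_bits)
    G = [0] * (n + 1)                          # G[j] ends up holding carry c_j
    for i in range(n):
        G[i + 1] = a_bits[i] & b_bits[i]       # g[i, i+1]
    P = {0: [a_bits[i] ^ b_bits[i] for i in range(n)]}   # p[i, i+1]
    log_n = floor_log2(n)

    # P-rounds: P_t[m] = p[2^t m, 2^t (m+1)]
    for t in range(1, log_n):
        P[t] = [0] * (n >> t)
        for m in range(1, n >> t):
            P[t][m] = P[t - 1][2 * m] & P[t - 1][2 * m + 1]

    # G-rounds: carries at positions that are multiples of 2^t
    for t in range(1, log_n + 1):
        for m in range(0, n >> t):
            G[(m << t) + (1 << t)] ^= G[(m << t) + (1 << (t - 1))] & P[t - 1][2 * m + 1]

    # C-rounds: fill in the remaining carries, largest span first
    for t in range(floor_log2((2 * n) // 3), 0, -1):
        for m in range(1, ((n - (1 << (t - 1))) >> t) + 1):
            G[(m << t) + (1 << (t - 1))] ^= G[m << t] & P[t - 1][2 * m]
    return G[1:]


def ripple_carries(a_bits, b_bits):
    """Reference carries from a plain ripple-carry recurrence."""
    carries, c = [], 0
    for a, b in zip(a_bits, b_bits):
        c = (a & b) | (c & (a ^ b))
        carries.append(c)
    return carries
```

Comparing `qcla_carries` against `ripple_carries` over all inputs of a given width is a quick way to confirm that the round structure reproduces ordinary binary carries.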

2.3. MBQC as Solution for Long-Distance Communication in
Nearest-Neighbor Architectures

The QCLA circuit explained above is one example of a circuit design that assumes
communication between non-adjacent qubits is allowed. However, scalable quantum
computers may allow only nearest-neighbor interactions. The depth complexity
of a circuit on a nearest-neighbor (NN) architecture may be larger than on non-NN
architectures. Under some circumstances, MBQC gives us a trade-off between depth
and space complexity: one can reduce the circuit depth by adding a number of
measurements, entanglements, and byproduct operations to the quantum circuit.

The out-of-place and in-place QCLA have overall depths of
$\lfloor \log_2 n \rfloor + \lfloor \log_2 (n/3) \rfloor + 4$ and
$\lfloor \log_2 n \rfloor + \lfloor \log_2 (n-1) \rfloor + \lfloor \log_2 (n/3) \rfloor + \lfloor \log_2 ((n-1)/3) \rfloor + 8$, respectively.
However, these abstract quantum circuits assume an unrealistic condition: that
interactions between non-adjacent qubits can be perfectly implemented. When application
qubits are assigned positions in a quantum computer, some qubits we wish to
interact may be widely separated; examining the circuit diagram for QCLA shows
many long-distance gates crossing over many other qubits. In a nearest-neighbor
architecture, we must swap qubits, step by step, until our desired qubits become
neighbors. On a single line, the $O(\log_2 n)$ time steps for QCLA expand to $O(n)$.

3. Quantum Carry-Lookahead Adder for Measurement-Based 
Quantum Computation 

This section explains the implementation of QCLA for MBQC. The performance 
and requirements for both out-of-place and in-place MBQCLA circuit schemes are
evaluated. First we describe our metrics for evaluating circuits. The exposition of di- 
rect mapping on both schemes is given, followed by the optimization for the in-place 
circuit. The optimization is done by adjusting the border between the rounds of the 
circuit, then removing the unnecessary lattice sites between the quantum gates, re- 
ducing the communication costs in the circuit. For comparison, a graph state form 
of QCLA is also presented. More detailed results are presented in Appendix C. 
3.1. Evaluating Algorithms Executed using MBQC

Logical quantum circuits can be evaluated based on the execution time, or circuit 
depth (usually measured in numbers of Toffoli gates), number of qubits used, and 
total number of gates executed. The number of logical qubits is the number of re- 
quired input qubits plus the number of ancillae required in the circuit representation 
of the algorithm. 

We propose the use of (a) the number of qubits in the cluster state, (b) the 
number of clustering operations, (c) circuit area, and (d) circuit depth as measures 
of performance and cost for algorithms executed using MBQC. The number of 
cluster operations is the number of successful interactions needed. The circuit area 
is the height of the cluster times its width, assuming a regular rectangular lattice. 
All of these measures can be expressed in terms of problem size; in our case, in 
terms of n, the length of each of the logical registers being added. 

The goal of this paper is to minimize the execution time (d), while the other
three (a-c) are measures of the cost. These costs, as shown in Figure 1, can
be divided into two categories: first, computational resources, i.e., the number
of cluster qubits required for Toffoli Phase, CNOT and NOT gates; second,
communication resources, i.e., the number of cluster qubits required for SWAP
gates and wires. A circuit which uses no communication resources is called an
optimal circuit. As noted above, MBQC requires a measure-adapt-measure cycle
to implement non-Clifford gates. The execution time is the number of rounds of
measurement, each followed by computation of the adaptive bases for the next round.



Fig. 1. Illustration of circuit costs for MBQCLA. On a two-dimensional Manhattan grid, the
costs contain computational and communication costs. The resources for communication costs can
be separated into two types of resources: horizontal (wires) and vertical (SWAP gates).



In general, the optimal circuit resources can be determined by summing the
resources consumed by the various types of computational gates. Thus, it is expressed
by

$$\sum_{i \in \text{Quantum Gate}} \mathcal{X}_i \mathcal{R}_i \qquad (11)$$

where

• $\mathcal{X}_i$ = Number of quantum gates of type $i$

• $\mathcal{R}_i$ = Qubit resources for an $i$ gate

Because the QCLA is structured in a set of rounds, each of which contains only 
gates of a single type (e.g. Toffoli Phase Gate), we can discuss the cost in those 
terms. The cost of a null circuit would simply be the number of logical qubits 
multiplied by the cost of a horizontal wire. In an actual circuit, we replace some 
sections of horizontal wire with logical gates, and add vertical wires with SWAP 
gates as necessary to implement the logic. Thus, each round in the QCLA, when 
mapped onto the cluster state, is as wide as necessary to accommodate the necessary 
gate type.^b

By considering the number of SWAP, Toffoli and CNOT gates in the initial
addition circuit and in the P, G and C networks as shown above, we can approximate
the physical resources needed for an $n$-qubit out-of-place MBQCLA following this
expression:

$$\text{Size} \approx \sum_i \left( \mathcal{V}(n) \times \mathcal{B}_i(n) \times \mathcal{T}_i + \mathcal{X}_i(n) \left( \mathcal{R}_i - \mathcal{L}_i \times \mathcal{T}_i \right) \right) + \mathcal{R}_{SWAP} \times \mathcal{S}_{SWAP} \qquad (12)$$

where $i \in$ {Toffoli Phase, CNOT, and NOT Gates}, $\mathcal{V}(n)$ = number of logical qubits,
$\mathcal{B}_i(n)$ = number of rounds for the gate of type $i$, $\mathcal{X}_i(n)$ = number of gates of type
$i$, $\mathcal{T}_i$ = width of the $i$-gate (in lattice sites), $\mathcal{R}_i$ = number of lattice qubits in an
$i$-gate, $\mathcal{L}_i$ = number of logical qubits in an $i$-gate (generally, one to three), $\mathcal{R}_{SWAP}$
is the number of lattice qubits in a cluster state SWAP gate and $\mathcal{S}_{SWAP}$ = number of
SWAP gates in a QCLA circuit, which is dependent on the mapping of the logical
qubits to positions on the lattice.

In Equation (12), $\sum_i (\mathcal{V}(n) \times \mathcal{B}_i \times \mathcal{T}_i)$ is the cost of a lattice large enough to hold
all of the circuit rounds that use type $i$ gates (that is, the horizontal communication
costs). $-\mathcal{L}_i \times \mathcal{X}_i \times \mathcal{T}_i$ is an adjustment for replacing wires with logic gates.
$\mathcal{R}_{SWAP} \times \mathcal{S}_{SWAP}$, which depends on the type of rounds in the QCLA circuit, is the vertical
communication in the circuit, which consists entirely of SWAP gate resources.
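The size estimate is a plain sum over gate types plus a SWAP term, which the following minimal sketch spells out. It reads the estimate as: lattice area for every round, plus an adjustment for each gate that replaces wire segments, plus the SWAP cost. The class and field names are illustrative, and any numbers passed in are placeholders rather than values from the paper, except that a SWAP cost of 12 lattice qubits is consistent with the statement that 30 SWAP gates consume 360 lattice qubits.

```python
from dataclasses import dataclass


@dataclass
class GateType:
    """Per-gate-type parameters of the size estimate (all hypothetical inputs)."""
    rounds: int          # B_i(n): rounds that use this gate type
    count: int           # X_i(n): number of gates of this type
    width: int           # T_i: width of the gate in lattice sites
    lattice_qubits: int  # R_i: lattice qubits consumed by one gate
    logical_qubits: int  # L_i: logical qubits the gate acts on


def mbqcla_size(num_logical, gate_types, swap_qubits, swap_count):
    """Approximate cluster size: lattice + gate adjustment + SWAP cost."""
    total = 0
    for g in gate_types:
        # Horizontal cost: a lattice wide enough for every round of this gate type
        lattice = num_logical * g.rounds * g.width
        # Replace L_i wire segments of width T_i by one gate of R_i qubits
        adjustment = g.count * (g.lattice_qubits - g.logical_qubits * g.width)
        total += lattice + adjustment
    # Vertical cost: SWAP gates only
    return total + swap_qubits * swap_count
```

With a single hypothetical gate type, `mbqcla_size(4, [GateType(2, 3, 5, 20, 3)], 12, 2)` evaluates to 40 + 15 + 24 = 79, which makes the three contributions easy to see separately.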

Proposed implementations of cluster state quantum computing in solid-state
technologies, which need an Ising-like Hamiltonian, operate on a fixed
2-D lattice. Hence, they require wires for communication between the rounds, which
means those proposals will use our optimized circuit. A photonic-based quantum
computer will require no wires for communication between the rounds, allowing
the optimal circuits or graph state form to be used more or less directly.




b Usually the width of a Toffoli gate. 
3.2. Out-of-place Measurement-Based Quantum Carry-Lookahead
Adder

Our design for a 10-bit form of the out-of-place QCLA on MBQC is shown in
Figure 2. The input qubits are on the left (top in the rotated figure) and output
states are on the right. In the figures in this paper, a pink square qubit represents a
cluster qubit measured in the $\sigma_x$-eigenbasis, a green qubit is for $\pi$-rotation on a
$\sigma_x$-eigenbasis measurement, a red qubit is for $\sigma_y$-eigenbasis measurement and a blue
qubit is for $\sigma_z$-eigenbasis measurement. The propagation pattern of one logical qubit
is highlighted in yellow. Our logical qubits are spaced with a pitch of four lattice
sites to accommodate the necessary spacing between gates. Each large box outlines
one round in the P, G or C networks. The circuit is presented in unoptimized form
for clarity.

This circuit is essentially a direct mapping of the abstract out-of-place QCLA
(Figure 12 in Appendix B) to MBQC. The logical gates used are those described
in Figure 10 in Appendix A. As noted above, in addition to the computational
resources, we must add wires and SWAP gates. The long-distance gates from the
abstract out-of-place QCLA (Figure 12) are executed using the scheme for
non-adjacent computation (Figure 11). To completely characterize the circuit, we need
to know how many SWAP gates and wire segments are added to complete the
circuit. The exact cost depends on the layout of logical qubits. Below we calculate
the number of SWAP gates required assuming the data layout of Figure 2.

The abstract circuit consists of addition and carry computation circuits. For
adding two $n$-qubit registers, the addition circuit is built from $n$ Toffoli gates and
$3n-1$ CNOT gates, while the carry computation machinery consists of
$4n - 3w(n) - 3\lfloor \log_2 n \rfloor - 1$ Toffoli gates. The number of Toffoli gates in this circuit can be obtained
by adding the number of Toffolis in the addition, P, G, and C networks. For the
out-of-place QCLA circuit, we have $n - w(n) - \lfloor \log_2 n \rfloor$ Toffoli gates for the P network,
$n - w(n)$ Toffoli gates for the G network and $n - \lfloor \log_2 n \rfloor - 1$ for the C network, where $w(n)$
is the Hamming weight of the binary representation of $n$. Furthermore, the number
of SWAP gates, which is the vertical communication resources, can be obtained as
follows:

• The initial addition round needs

$$S_{Ad} = 4n - 2w(n) - 2\lfloor \log_2 n \rfloor \qquad (13)$$

SWAP gates. For $n=10$, we need 30 SWAP gates consuming 360 lattice qubits.

• The propagate network needs 28 SWAP gates for $n=10$. This number can be
obtained from the number of Toffoli gates for each round, $\lfloor n/2^{t_p} \rfloor - 1$, where $t_p$ is
the round number in the propagate network, $1 \le t_p \le \lfloor \log_2 n \rfloor - 1$. There are 4
Toffoli gates in the first P round with 16 SWAP gates and 1 Toffoli gate in the
second P round with 12 SWAP gates. The vertical communication in P networks
is

$$S_P = \sum_{t_p=1}^{\lfloor \log_2 n \rfloor - 1} \left( 2n - 2^{t_p+1} \right) \qquad (14)$$

Fig. 2. Out-of-place MBQCLA. For n=10, the circuit consists of: 4 addition blocks, 9 rounds of
gates for the carry networks (2 Propagate, 3 Generate, 2 Inverse Propagate and 2 Carry networks).
For explanation of the colors, see Appendix A.

• Similarly, the generate network requires 58 SWAP gates for $n=10$, which agrees with
$5n + w(n) + 2\lfloor \log_2 n \rfloor$. The number of Toffoli gates in round $t_g$ is $\lfloor n/2^{t_g} \rfloor$, so
we have three rounds of generate networks; the first round consists of 5 Toffoli
gates with 12 SWAP gates, the second round needs 2 Toffoli gates with 20 SWAP
gates and the third round requires 1 Toffoli gate with 26 SWAP gates. The SWAP
gate resources in the G network can be approximated by

$$S_G = \sum_{t_g=1}^{\lfloor \log_2 n \rfloor} \left( 4\lfloor \log_2 n \rfloor + 2^{t_g+1} \lfloor \log_2 t_g \rfloor \right) \qquad (15)$$

where $t_g$ is the round in the G network.

• The number of Toffoli gates in the carry network is $\lfloor (n - 2^{t_c - 1})/2^{t_c} \rfloor$ per round. Therein, the first
Carry round has 4 Toffoli gates with 16 SWAP gates and its second round has
22 SWAP gates. The SWAP gate resources in the C network are

S c = ^-2[log 2 (t c )\ (16) 

t c =i 

where $t_c$ is the round number in the C network.

Following Equation (12), $\mathcal{S}_{SWAP}$ is the sum of Equations (13), (14), (15), and (16):

$$\mathcal{S}_{SWAP} = S_{Ad} + S_P + S_G + S_C. \qquad (17)$$

For the out-of-place MBQCLA, we see that the depth is reduced to
$\lfloor \log_2 n \rfloor + \lfloor \log_2 (n/3) \rfloor + 7$ compared to $\approx 3n$ for the VBE ripple-carry. However,
this circuit costs more in physical resources, $\approx 901n + 224n \times \lfloor \log_2 n \rfloor$ compared to
$\approx 304n$ for the VBE ripple-carry. The comparison of size and depth between MBQC
VBE and MBQCLA is shown in Figure 3.
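The gate and SWAP counts quoted for $n=10$ can be reproduced with a few lines of arithmetic. This minimal sketch assumes the Toffoli counts $n - w(n) - \lfloor \log_2 n \rfloor$ (P), $n - w(n)$ (G), $n - \lfloor \log_2 n \rfloor - 1$ (C), the per-round counts $\lfloor n/2^{t} \rfloor - 1$ (P) and $\lfloor n/2^{t} \rfloor$ (G), the addition-round SWAP count $4n - 2w(n) - 2\lfloor \log_2 n \rfloor$, and a per-round propagate SWAP count of $2n - 2^{t_p+1}$. It also assumes the inverse-propagate rounds repeat the P count. Function names are illustrative.

```python
def hamming_weight(n):
    """w(n): number of 1 bits in the binary representation of n."""
    return bin(n).count("1")


def floor_log2(n):
    """floor(log2(n)) for a positive integer n."""
    return n.bit_length() - 1


def toffoli_counts(n):
    """Toffoli gates per network of the out-of-place adder."""
    w, lg = hamming_weight(n), floor_log2(n)
    counts = {"addition": n, "P": n - w - lg, "G": n - w, "C": n - lg - 1}
    # Carry machinery: P, G, C plus the inverse-propagate rounds (assumed equal to P)
    counts["carry_total"] = 2 * counts["P"] + counts["G"] + counts["C"]
    return counts


def toffolis_per_round(n):
    """Per-round Toffoli counts for the P and G networks."""
    lg = floor_log2(n)
    p_rounds = [n // 2 ** t - 1 for t in range(1, lg)]
    g_rounds = [n // 2 ** t for t in range(1, lg + 1)]
    return p_rounds, g_rounds


def swaps_addition(n):
    """SWAP gates in the initial addition round."""
    return 4 * n - 2 * hamming_weight(n) - 2 * floor_log2(n)


def swaps_propagate(n):
    """SWAP gates summed over the propagate rounds."""
    return sum(2 * n - 2 ** (t + 1) for t in range(1, floor_log2(n)))
```

For `n = 10` these give 30 SWAP gates for the addition round, 28 for the propagate network (16 + 12), per-round Toffoli counts of `[4, 1]` and `[5, 2, 1]`, and a carry total of 24, which equals $4n - 3w(n) - 3\lfloor \log_2 n \rfloor - 1$.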

3.3. In-place Measurement-Based Quantum Carry-Lookahead
Adder

The next step is obtaining the performance and requirements of the in-place quantum
carry-lookahead adder. Following the scheme in Ref. 15, the erasure (uncomputation)
of the low-order $n-1$ bits of the carry string $c$ requires additional circuitry.
The algorithm for the in-place form is more complex than out-of-place and uses
about twice as many Toffoli gates.

The subsequent procedures, as provided in Appendix B, give the in-place MBQCLA
circuit horizontal resources, as summarized in Table 3 (Appendix C). As shown in
the previous section, the vertical communication resources can be estimated by
counting the number of SWAP gates in the circuit. Because the in-place circuit
uses more ancillae, which we interleave with the other qubits, the number of SWAP
gates for the initial round of half-adders increases to $6n - 4w(n) - 4\lfloor \log n \rfloor - 2$. The other
SWAP gate resources can be obtained by examining the Propagate, Generate, and
Carry networks. In the in-place circuit we need them both for computing and
uncomputing the carry status, meaning $8n + 14\lfloor \log n \rfloor + 2$ SWAP gates are needed for
non-adjacent quantum computation. Straightforwardly, the physical resources for the
in-place circuit are

$$\approx 2896n + 64n\lfloor \log_2 n \rfloor. \qquad (18)$$

Fig. 3. Size and depth comparison between MBQC VBE and MBQCLA. "+", "*", "o" and "□"
marks are for in-place, optimized in-place, out-of-place and MBQC VBE circuits, respectively.

Also, by the use of Equations (13), (14), (15), and (16) for the out-of-place circuit, the
vertical communication resources (SWAP gates), or $\mathcal{S}_{SWAP}$, for the in-place MBQCLA
circuit is

$$\mathcal{S}_{SWAP} = S_{Ad} + S'_{Ad} + 4S_P + 2S_G + 2S_C \qquad (19)$$

where S' M = Y):—i 2(n — 1), is an additional column required to perform an in-place 
circuit. 
It is also useful to calculate the optimal in-place MBQCLA circuit resources.
According to Equation (11),

$$S_{optimal} = \mathcal{X}_{TPG} \mathcal{R}_{TPG} + \mathcal{X}_{CNOT} \mathcal{R}_{CNOT} + \mathcal{X}_{NOT} \mathcal{R}_{NOT}. \qquad (20)$$

By the use of Table p] , one can obtain 

Soptimai = 162(>(n) + \log 2 {n - 1)J + \log 2 {n)\ - w(n - 1)) + 542 ? i - 395. (21) 

3.4. MBQCLA Latencies 

As discussed in Section 2.3, a carry-lookahead addition in MBQC can reach $O(\log n)$
time due to the constant depth of primitive gates in MBQC. The depth of a
Toffoli gate in MBQC is 2, and our circuit does not change the original behavior.
In addition to the Toffoli-dependent rounds, the QCLA requires a small number of
rounds of CNOTs and NOTs, each of which adds one to the circuit depth, giving
total out-of-place and in-place MBQCLA depths of
$2(\lfloor \log_2 n \rfloor + \lfloor \log_2 (n/3) \rfloor) + 13$
and $2(\lfloor \log_2 n \rfloor + \lfloor \log_2 (n-1) \rfloor + \lfloor \log_2 (n/3) \rfloor + \lfloor \log_2 ((n-1)/3) \rfloor + 14)$, respectively.
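To get a feel for these latencies, the depth expressions can be evaluated for a few register lengths and set against the $\approx 3n$ depth quoted earlier for the VBE ripple-carry adder. This is a minimal sketch: it reads the two depths as $2(\lfloor \log_2 n \rfloor + \lfloor \log_2 (n/3) \rfloor) + 13$ and $2(\lfloor \log_2 n \rfloor + \lfloor \log_2 (n-1) \rfloor + \lfloor \log_2 (n/3) \rfloor + \lfloor \log_2 ((n-1)/3) \rfloor + 14)$, assumes $n \ge 4$ so that every logarithm is non-negative, and uses illustrative function names.

```python
def floor_log2(x):
    """floor(log2(x)) for a positive integer x."""
    return x.bit_length() - 1


def out_of_place_depth(n):
    """Out-of-place MBQCLA depth; valid for n >= 4."""
    # floor(log2(n/3)) equals floor_log2(n // 3) whenever n >= 3
    return 2 * (floor_log2(n) + floor_log2(n // 3)) + 13


def in_place_depth(n):
    """In-place MBQCLA depth; valid for n >= 4."""
    return 2 * (floor_log2(n) + floor_log2(n - 1)
                + floor_log2(n // 3) + floor_log2((n - 1) // 3) + 14)


def ripple_depth_estimate(n):
    """Approximate depth of the VBE ripple-carry adder (about 3n)."""
    return 3 * n


if __name__ == "__main__":
    for n in (16, 100, 1024):
        print(n, out_of_place_depth(n), in_place_depth(n), ripple_depth_estimate(n))
```

For 100-qubit registers this gives depths of 35 (out-of-place) and 72 (in-place) against roughly 300 for the ripple-carry estimate, in line with the order-of-magnitude speedup claimed for registers longer than 100 qubits.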

3.5. Optimized In-place MBQCLA 

We can optimize the MBQCLA spatial resources in several ways. First, by relaxing 
the Manhattan constraints on the physical geometry, we can use a graph state, 
which requires fewer communication resources if it can be physically implemented. 
The graph state adder will be presented in Section 3.6.

In this section, we retain the Manhattan constraint but optimize the circuit. The 
idea of the bent network, imagining that logical qubits are propagated through 
traces on a single-layer two-dimensional surface, is used to reduce the horizontal 
resources of MBQCLA. 

3.5.1. Bent Network in Quantum Carry-Lookahead Adder

Contrary to the usual quantum circuit assumption that the horizontal axis
relates to logical time, in a "bent" network the temporal axis flows freely in the spatial
layout. The consequence is that a more compact circuit can be constructed.

If we apply this bent network method to MBQCLA, we also find that we can
reduce the horizontal size of the circuit. The bent form of VBE is purely rectangular,
but MBQCLA is not as regular. The horizontal size for every logical qubit position
will depend on the number of quantum gates, since it will vary along the register
as shown in Fig. 6. The illustration of a bent network implementation for $n=10$ is
given in Fig. 7.

3.5.2. Optimized Circuit Formulation 

To optimize the circuit, we take small groups of qubits, or subregisters, and slide
them toward the middle of the circuit. As can be seen in Fig. 7, bending the network
can reduce the cluster resources required near qubits $a_0$, $a_1$, $a_2$, $a_3$, $a_4$, a^, $a_8$, and
$a_9$ by the amount:

$$\sum_{i=0}^{n-1} \left( C_i A_i W \right) \qquad (22)$$

where $n$ = number of logical qubits, $C_i$ = the number of rounds (columns) that the
$i$th subregister moves, $A_i$ = number of logical qubits in the $i$th subregister (usually
3, sometimes 4), and $W$ = width of the Toffoli Phase Gate. For $n=10$, this manual
optimization of horizontal communication results in a reduction of $\approx$ 12% in
spatial resources, or $\approx$ 3822 qubits.

3.6. Graph State Quantum Carry-Lookahead Adder (GSQCLA)

In the previous sections, we presented cluster state adders. Here, we present a 
graph state adder. The GSQCLA is simpler and follows more directly from the 
original QCLA definition. For MBQC in graph states, we assume that the restriction 
to Manhattan physical geometry is lifted, and arbitrary entanglement operations 
between qubits are allowed. The vertices follow the graphical notation of MBQC in 
cluster states but with the three additional types of vertices:{measured input qubits 
in an arbitrary angle, — ?, and j} and there are two types of edges : entanglement 
and input/output information flows. We choose CNOT<4> and CCNOT<39> for
running QCLA. When two GSQC quantum gates are concatenated, the output
qubits of one become the input of the other. A bird's-eye view of the in-place
GSQCLA is given in the accompanying figure.

The circuit depth of graph state is identical to that of MBQC, so we focus here 
on the number of qubits and entanglement operations. 

Concatenating quantum gates in graph states can reduce the number of qubits
whilst the number of entanglement operations is invariant. The number of
entanglement operations in GSQCLA is

$$\sum_k E_k X_k \qquad (23)$$

where $k \in$ {Toffoli Phase, CNOT, and NOT Gates}, $E_k$ = the number of entanglement
operations in type $k$ gates, and $X_k$ = the number of gates of type $k$.

The number of qubit resources before the removal of unnecessary measurements
is given by Eq. (11). After the adjustment, the qubit resources are

2C, Qadd) (24) 

m I 

Where i e {Toffoli Phase, CNOT, and NOT Gates}, M = number of removed qubits 
oftype/circuitandQp^El^" 1 ^^- 1 )' = E^f^ 3^J, Qc = 



I 



I 
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Yut°=i* 3 ^ 1 ^ L ^ n ~2tl — ~J ) an d Qadd are the number of removed qubits in P, G, C, 
and additional rounds in the circuit, respectively. The formulation of Q a dd varies 
depending on the type of circuit. 

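We know from Table 1 that the in-place QCLA requires $10n - 3w(n) - 3w(n-1) - 3\lfloor\log_2 n\rfloor - 3\lfloor\log_2 (n-1)\rfloor - 7$ Toffoli Phase Gates and that the Toffoli Phase Gate uses 39 qubits and 43 entangling operations. Each $-3w(\cdot)$ or $-3\lfloor\log_2(\cdot)\rfloor$ term in the gate count therefore contributes $-3 \times 43 = -129$ entangling operations per unit. Based on the above formulations, the number of entanglement operations for the out-of-place GSQCLA is

$$224n - 129\left(w(n) + \lfloor\log_2 n\rfloor\right) - 46. \tag{25}$$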

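The number of qubits for this circuit is

$$201n - 117\left(w(n) + \lfloor\log_2 n\rfloor\right) - 2\sum_{t_p=1}^{\lfloor\log_2 n\rfloor - 1} 2\left(\left\lfloor \frac{n}{2^{t_p}} \right\rfloor - 1\right) - \sum_{t_g=1}^{\lfloor\log_2 n\rfloor} 3\left\lfloor \frac{n}{2^{t_g}} \right\rfloor - \sum_{t_c=1}^{\lfloor\log_2 (2n/3)\rfloor} 3\left\lfloor \frac{n - 2^{t_c-1}}{2^{t_c}} \right\rfloor - 43, \tag{26}$$

or roughly $201n$ for large $n$. The Propagate sum is doubled because the out-of-place circuit runs both the Propagate and the Inverse Propagate networks. This formulation is obtained after concatenating the GSQCLA quantum gates and adjusting their qubit resources to form the circuit.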
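Similarly, the in-place GSQCLA has

$$444n - 129\left(w(n) + w(n-1) + \lfloor\log_2 n\rfloor + \lfloor\log_2 (n-1)\rfloor\right) - 318 \tag{27}$$

entanglement operations. The number of qubits for the in-place GSQCLA is

$$410n - 117\left(w(n) + w(n-1) + \lfloor\log_2 n\rfloor + \lfloor\log_2 (n-1)\rfloor\right) - 4\sum_{t_p=1}^{\lfloor\log_2 n\rfloor - 1} 2\left(\left\lfloor \frac{n}{2^{t_p}} \right\rfloor - 1\right) - 2\sum_{t_g=1}^{\lfloor\log_2 n\rfloor} 3\left\lfloor \frac{n}{2^{t_g}} \right\rfloor - 2\sum_{t_c=1}^{\lfloor\log_2 (2n/3)\rfloor} 3\left\lfloor \frac{n - 2^{t_c-1}}{2^{t_c}} \right\rfloor - 261, \tag{28}$$

about twice the size of the out-of-place version.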
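As a cross-check of the gate-by-gate sum $\sum_k E_k X_k$ against the in-place closed form, the following minimal sketch evaluates both for a range of $n$. It assumes that $w(n)$ is the number of ones in the binary expansion of $n$, takes the 43 entangling operations per Toffoli Phase Gate and the in-place gate counts from the text, and uses 3 entangling operations per CNOT and 1 per NOT. Those last two values are not stated in this section; they are assumptions chosen because they reproduce the coefficients $444n$ and $-318$. All function names are illustrative.

```python
def w(n: int) -> int:
    """Hamming weight of n (assumed meaning of w(n))."""
    return bin(n).count("1")


def flog2(n: int) -> int:
    """floor(log2(n)) for n >= 1, computed exactly on integers."""
    return n.bit_length() - 1


# Entangling operations per gate type.
# 43 is quoted in the text; 3 (CNOT) and 1 (NOT) are assumptions.
E = {"TPG": 43, "CNOT": 3, "NOT": 1}


def in_place_gate_counts(n: int) -> dict:
    """Gate counts X_k of the abstract in-place QCLA, valid for n >= 2."""
    return {
        "TPG": 10 * n - 3 * w(n) - 3 * w(n - 1)
               - 3 * flog2(n) - 3 * flog2(n - 1) - 7,
        "CNOT": 4 * n - 5,
        "NOT": 2 * n - 2,
    }


def entanglement_ops_sum(n: int) -> int:
    """Sum over gate types of E_k * X_k."""
    return sum(E[k] * x for k, x in in_place_gate_counts(n).items())


def entanglement_ops_closed_form(n: int) -> int:
    """Closed form with every w() and floor-log term subtracted."""
    return (444 * n
            - 129 * (w(n) + w(n - 1) + flog2(n) + flog2(n - 1))
            - 318)


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, entanglement_ops_sum(n), entanglement_ops_closed_form(n))
```

Under these assumptions the two functions agree for every $n \ge 2$, which is only possible when the Hamming-weight and logarithm terms all enter with a negative sign.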



3.7. Resource Comparison 

Figure 4 plots the resources required for the in-place MBQCLA, as derived in Equations (19), (21), and (18). The red area, which represents the horizontal communication costs of MBQCLA, is ≈ 77 % of the qubits in the cluster. The light green area, showing the costs of MBQCLA circuit vertical communication, consumes ≈ 11 %. The cost of the computational circuit, shown by the light blue area, is ≈ 12 % of the spatial resources. The light yellow area represents the qubit resources for the in-place GSQCLA, which costs ≈ 9 % of the spatial resources of the MBQCLA.

[Figure: "Resources Comparison" plot; horizontal axis: Logical Qubits (n); legend: Horizontal, Vertical]

Fig. 4. Comparison of computational and communication resources in the in-place MBQCLA circuit. The bottom line on the graph represents the ideal circumstance for the computational resource, and the other two show the circuit with additional resources for the horizontal and the vertical communications.

4. Conclusion 

In this work, the circuit designs for several forms of a measurement-based quantum carry-lookahead adder (MBQCLA) and a graph-state quantum carry-lookahead adder (GSQCLA) are presented. We have shown the resources required to perform the quantum carry-lookahead adder in a cluster state as a function of the number of logical qubits, the width of quantum gates, and the number of qubits in quantum gates. By bending the network and removing the border between the rounds, the optimization of the in-place MBQCLA circuit changes its shape from a rectangle to a diamond-like form. The proposed evaluation methods for the cost and performance of application circuits for MBQC will be useful for large-scale quantum computer architecture, since application circuits for quantum computers will need optimization similar to that done for classical computer technology. This work has shown the value of finding application algorithms that match the strengths of measurement-based quantum computation.

Fig. 5. In-place MBQCLA. For n=10, the circuit consists of 8 addition circuits and 18 carry networks (4 Propagate, 6 Generate, 4 Inverse Propagate and 4 Carry networks).

Fig. 6. Optimized in-place circuit. The low n-1 bits of the carry string output are tucked into the interior of the circuit by bending the network.

Fig. 7. The optimized in-place MBQCLA circuit forms a diamond-like circuit. Hand optimization of the circuit reduced the size by ≈ 12 %.

Fig. 8. MBQCLA circuit allowing the use of a graph for entanglement and communication.

Appendix A MBQC Gates and Graphical Notation 

To illustrate MBQC, we detail the operation of the NOT gate, as shown in Figure 9. The cluster contains 5 qubits, where $C_{input}$ is qubit 1, $C_{machine}$ is qubits 2, 3 and 4, and $C_{output}$ is qubit 5. We begin from the cluster state eigenvalue equations for 5 qubits:



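$$\sigma_x^{(1)} \sigma_z^{(2)} \,|\phi\rangle_{C(NOT)} = |\phi\rangle_{C(NOT)} \tag{29}$$

$$\sigma_z^{(1)} \sigma_x^{(2)} \sigma_z^{(3)} \,|\phi\rangle_{C(NOT)} = |\phi\rangle_{C(NOT)} \tag{30}$$

$$\sigma_z^{(2)} \sigma_x^{(3)} \sigma_z^{(4)} \,|\phi\rangle_{C(NOT)} = |\phi\rangle_{C(NOT)} \tag{31}$$

$$\sigma_z^{(3)} \sigma_x^{(4)} \sigma_z^{(5)} \,|\phi\rangle_{C(NOT)} = |\phi\rangle_{C(NOT)} \tag{32}$$

$$\sigma_z^{(4)} \sigma_x^{(5)} \,|\phi\rangle_{C(NOT)} = |\phi\rangle_{C(NOT)} \tag{33}$$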






Fig. 9. NOT gate in MBQC. The input qubit and all pink qubits are measured in the $\sigma_x$-eigenbasis, and the framed green qubit is measured in an adaptive basis depending on the measurement outcome of qubit 2. A measurement on qubit 3 in the xy-plane, rotated about the z-axis by the angle $\pi$, makes qubit 3 non-adaptive.

After obtaining the quantum correlation of the cluster state, two measurement steps are performed: first, the $\sigma_x$ measurement on qubit 2 and qubit 4,

$$|\psi\rangle_{C(NOT)} = P^{(2)}_{x,s_2} P^{(4)}_{x,s_4} \,|\phi\rangle_{C(NOT)} \tag{34}$$



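This first measurement converts the initial quantum correlation to the eigenvalue equations:

$$\sigma_x^{(1)} \sigma_x^{(3)} \sigma_x^{(5)} \,|\psi\rangle_{C(NOT)} = |\psi\rangle_{C(NOT)} \tag{35}$$

$$\sigma_z^{(1)} \sigma_z^{(3)} \,|\psi\rangle_{C(NOT)} = (-1)^{s_2} |\psi\rangle_{C(NOT)} \tag{36}$$

$$\sigma_z^{(3)} \sigma_z^{(5)} \,|\psi\rangle_{C(NOT)} = (-1)^{s_4} |\psi\rangle_{C(NOT)} \tag{37}$$
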

Furthermore, the eigenbasis of $\vec{r}_{xy}\left((-1)^{s_2}(-\eta)\right)\cdot\vec{\sigma}$, where $\vec{r}_{xy}(\eta)\cdot\vec{\sigma} = \cos(\eta)\,\sigma_x + \sin(\eta)\,\sigma_y$, is chosen as the measurement basis on qubit 3 to realize the operation NOT by the measurement pattern $\mathcal{M}(NOT)$. Mathematically, it can be expressed by:

$$|\psi'\rangle_{C(NOT)} = P^{(3)}_{xy(\eta)} \,|\psi\rangle_{C(NOT)} \tag{38}$$

where $P^{(3)}_{xy(\eta)} = \dfrac{1 + (-1)^{s_3}\, \vec{r}_{xy}(\eta)\cdot\vec{\sigma}^{(3)}}{2}$. This second measurement generates two eigenvalue equations from Equations (35), (36) and (37), which obey Theorem 1 of Raussendorf et al.:



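$$\sigma_x^{(1)}\, U_z^{(5)}[-\eta]\, \sigma_x^{(5)}\, U_z^{(5)}[\eta] \,|\psi'\rangle_{C(NOT)} = (-1)^{s_3} |\psi'\rangle_{C(NOT)} \tag{39}$$

$$\sigma_z^{(1)} \sigma_z^{(5)} \,|\psi'\rangle_{C(NOT)} = (-1)^{s_2 + s_4} |\psi'\rangle_{C(NOT)} \tag{40}$$
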

By choosing $\eta = \pi$, these equations give a NOT gate. This method can be broadened to perform quantum gates on a large-scale cluster state system. Similar to the work of Leung, quantum computation can be achieved in cluster states depending on the choice of measurement patterns.
Fig. 10. Quantum gates in Measurement-Based Quantum Computation. Due to the relationship between CCNOT and the Toffoli Phase Gate (TPG), CCNOT = $H_t\,(\mathrm{TPG})\,H_t^{\dagger}$, the target qubit can be chosen arbitrarily by putting a Hadamard gate on the chosen qubit.

Fig. 11. Non-adjacent computation. Four types of SWAP gates are implemented to propagate the information up and down between non-adjacent qubits for performing CNOT. Qubit $|a\rangle$ acts as the control qubit and $|b\rangle$ is the target qubit.

Generally, every quantum gate contains $C_l$ lattice qubits with $m$ measurements, $C_w$ width, and $C_h$ height. In this paper, we will use Raussendorf et al.'s quantum gates model. As shown in Figure 10 and Figure 11, CNOT and the Toffoli Phase Gate can be performed using measurements on 15 and 54 cluster qubits, respectively.

Appendix B Out-of-place and In-place Procedures for Abstract Quantum Carry-Lookahead Adder

Here we summarize the QCLA circuits as proposed by Draper et al. The circuit for out-of-place addition, as shown in Figure 12, has the form:

(1) For $0 \le i < n$, $Z[i+1] \oplus= A[i]B[i]$, setting $Z[i+1] = g[i, i+1]$.

(2) For $0 < i < n$, $B[i] \oplus= A[i]$, setting $B[i] = p[i, i+1]$ for $i > 0$, as needed to run the out-of-place addition circuit.

(3) Run the circuit of the P, G, and C networks. Upon completion, $Z[i] = c_i$ for $i \ge 1$.

(4) For $0 \le i < n$, $Z[i] \oplus= B[i]$. Now for $i > 0$, $Z[i] = a_i \oplus b_i \oplus c_i = s_i$. For $i = 0$, $Z[i] = b_i$.

(5) Set $Z[0] \oplus= A[0]$. For $1 \le i < n$, $B[i] \oplus= A[i]$. This fixes $Z[0]$ and resets $B$ to its initial value.



The addition circuit for in-place operation has the form:

(1) For $0 \le i < n$, $Z[i+1] \oplus= A[i]B[i]$, setting $Z[i+1] = g[i, i+1]$.

(2) For $0 \le i < n$, $B[i] \oplus= A[i]$, setting $B[i] = p[i, i+1]$ for $i > 0$ and $B[0] = s_0$.

(3) Run the circuit of the P, G, and C networks. Upon completion, $Z[i] = c_i$ for $i \ge 1$.

(4) For $1 \le i < n$, $B[i] \oplus= Z[i]$. Now $B[i] = s_i$.

(5) For $0 \le i < n-1$, $\neg B[i]$. Now $B$ contains $s'$.

(6) For $1 \le i < n-1$, $B[i] \oplus= A[i]$.

(7) Run the P, G, and C networks in reverse. Upon completion, $Z[i+1] = a_i s'_i$ for $0 \le i < n-1$, and $B[i] = a_i \oplus s'_i$ for $1 \le i < n$.

(8) For $1 \le i < n-1$, $B[i] \oplus= A[i]$.

(9) For $0 \le i < n-1$, $Z[i+1] \oplus= A[i]B[i]$.

(10) For $0 \le i < n-1$, $\neg B[i]$.

The resources for each quantum gate in the abstract in-place QCLA circuit are provided in the table below:

Table 1. Logic Gate Resources in Abstract In-Place QCLA

Quantum Gate    Resource
NOT             $2n-2$
CNOT            $4n-5$
TPG             $10n - 3w(n) - 3w(n-1) - 3\lfloor\log_2 n\rfloor - 3\lfloor\log_2 (n-1)\rfloor - 7$
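The closed-form gate counts for the in-place adder can be recovered by counting gates step by step. The sketch below is illustrative and rests on two assumptions that are not spelled out in this appendix: the round sizes of the P, G and C networks follow Draper et al. (with the Propagate rounds run forward and then inverted in each pass), and the reversed network in step (7) acts on $n-1$ bits. It also assumes $w(n)$ is the Hamming weight of $n$. Names are hypothetical.

```python
def w(n):
    """Hamming weight of n (assumed meaning of w(n))."""
    return bin(n).count("1")


def flog2(n):
    """floor(log2(n)) for n >= 1."""
    return n.bit_length() - 1


def carry_network_toffolis(n):
    """Toffoli count of one pass of the P, G, C and inverse-P rounds on n bits."""
    p = sum(n // 2 ** t - 1 for t in range(1, flog2(n)))
    g = sum(n // 2 ** t for t in range(1, flog2(n) + 1))
    t_max = 0                                   # floor(log2(2n/3)) for n >= 2
    while 3 * 2 ** (t_max + 1) <= 2 * n:
        t_max += 1
    c = sum((n - 2 ** (t - 1)) // 2 ** t for t in range(1, t_max + 1))
    return 2 * p + g + c


def in_place_counts_by_step(n):
    """Count gates by walking through steps (1)-(10), for n >= 2."""
    toffoli = (n                                # step 1
               + carry_network_toffolis(n)      # step 3
               + carry_network_toffolis(n - 1)  # step 7 (assumed to act on n-1 bits)
               + (n - 1))                       # step 9
    cnot = n + (n - 1) + (n - 2) + (n - 2)      # steps 2, 4, 6, 8
    not_gates = 2 * (n - 1)                     # steps 5 and 10
    return {"NOT": not_gates, "CNOT": cnot, "TPG": toffoli}


def in_place_counts_closed_form(n):
    """Closed forms as listed for the abstract in-place QCLA."""
    return {
        "NOT": 2 * n - 2,
        "CNOT": 4 * n - 5,
        "TPG": 10 * n - 3 * w(n) - 3 * w(n - 1)
               - 3 * flog2(n) - 3 * flog2(n - 1) - 7,
    }


if __name__ == "__main__":
    print(in_place_counts_by_step(10))
    print(in_place_counts_closed_form(10))
```

For $n = 10$ both routes give 18 NOT gates, 35 CNOT gates and 63 Toffoli Phase Gates.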
Fig. 12. Abstract out-of-place Quantum Carry-Lookahead Adder for n=10. The blue lines are the Propagate and Inverse Propagate networks; the red line is the Generate network and the green line is the Carry network. The low n-1 bits of the carry string are not yet erased.

Appendix C Requirements and Performance for Out-of-place and In-place MBQCLA



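Table 2. Requirements and Performance of the out-of-place MBQCLA

Parameter                          Value
Pitch                              4
Variables (Logical Qubits)         $V(n) = 4n - \lfloor\log_2 n\rfloor + 1$
Width                              $Width(n) = 15 \times \left(\lfloor\log_2 n\rfloor + \lfloor\log_2 (n/3)\rfloor\right) + 85$
Height                             $Height(n) = 4 \times \left(4n - w(n) - \lfloor\log_2 n\rfloor + 1\right) - 3$
Area                               $Height(n) \times Width(n) = \left(4 \times (4n - w(n) - \lfloor\log_2 n\rfloor + 1) - 3\right) \times \left(15 \times (\lfloor\log_2 n\rfloor + \lfloor\log_2 (n/3)\rfloor) + 85\right)$
Number of Clustering Operations    $\left(4n - w(n) - \lfloor\log_2 n\rfloor + 1\right) \times \left(15 \times (\lfloor\log_2 n\rfloor + \lfloor\log_2 (n/3)\rfloor) + 85 - 1\right) + \left(15 \times (\lfloor\log_2 n\rfloor + \lfloor\log_2 (n/3)\rfloor) + 85\right) \times \left(4n - w(n) - \lfloor\log_2 n\rfloor\right)$
Circuit Depth                      $\lfloor\log_2 n\rfloor + \lfloor\log_2 (n/3)\rfloor + 7$
Size (Number of Qubits)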


-3271+899n-419w(n)-377L«o S 2 {n)\+mn\log2(2n/Z)\-\Aw(n)\log 2 {2n/2 l )\ 
+42n [log 2 (n)J +168n [log 2 (n)J-14 \log 2 (2n/3)J [log 2 (n)J -42 [log 2 (n)J 2 

+6x(2n+E^i 2 1 (n " 1) 2(n-2 t p)+E^l 2 1 (n) 2(2(Liofla(n)J)+ (2*« )( [log 2 (t g )\ ) 
+E t L !=i 2 * J 2(n-L«o fl2 (t c )j)) 
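To make the out-of-place layout parameters concrete, the following sketch evaluates the circuit depth, width, height and area for a given $n$. It is a hedged illustration: it assumes $w(n)$ is the Hamming weight of $n$, restricts itself to $n \ge 3$ so that $\lfloor\log_2(n/3)\rfloor$ is non-negative, and computes the area as $Height(n) \times Width(n)$ with the width expression given in the table. The function names are hypothetical.

```python
def w(n):
    """Hamming weight of n (assumed meaning of w(n))."""
    return bin(n).count("1")


def flog2(n):
    """floor(log2(n)) for n >= 1."""
    return n.bit_length() - 1


def flog2_third(n):
    """floor(log2(n/3)) for n >= 3, computed on integers."""
    if n < 3:
        raise ValueError("this sketch only covers n >= 3")
    t = 0
    while 3 * 2 ** (t + 1) <= n:
        t += 1
    return t


def out_of_place_layout(n):
    """Depth, width, height and area of the out-of-place MBQCLA cluster."""
    rounds = flog2(n) + flog2_third(n)
    depth = rounds + 7
    width = 15 * rounds + 85
    height = 4 * (4 * n - w(n) - flog2(n) + 1) - 3
    return {"depth": depth, "width": width, "height": height,
            "area": height * width}


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, out_of_place_layout(n))
```

The depth grows only with the logarithmic terms, while the height grows linearly in $n$, so the area is dominated by the $n \log n$ product.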
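Table 3. Requirements and Performance of the in-place MBQCLA

Parameter                          Value
Pitch                              4
Variables (Logical Qubits)         $V(n) = 4n - \lfloor\log_2 n\rfloor + 1$
Width                              $Width(n) = 15 \times \left(\lfloor\log_2 n\rfloor + \lfloor\log_2 (n-1)\rfloor + \lfloor\log_2 (n/3)\rfloor + \lfloor\log_2 ((n-1)/3)\rfloor\right) + 157$
Height                             $Height(n) = 4 \times \left(4n - w(n) - \lfloor\log_2 n\rfloor + 1\right) - 3$
Area                               $Height(n) \times Width(n) = \left(4 \times (4n - w(n) - \lfloor\log_2 n\rfloor + 1) - 3\right) \times \left(15 \times (\lfloor\log_2 n\rfloor + \lfloor\log_2 (n-1)\rfloor + \lfloor\log_2 (n/3)\rfloor + \lfloor\log_2 ((n-1)/3)\rfloor) + 157\right)$
Number of Clustering Operations    $\left(4n - w(n) - \lfloor\log_2 n\rfloor + 1\right) \times \left(15 \times (\lfloor\log_2 n\rfloor + \lfloor\log_2 (n-1)\rfloor + \lfloor\log_2 (n/3)\rfloor + \lfloor\log_2 ((n-1)/3)\rfloor) + 156\right) + \left(15 \times (\lfloor\log_2 n\rfloor + \lfloor\log_2 (n-1)\rfloor + \lfloor\log_2 (n/3)\rfloor + \lfloor\log_2 ((n-1)/3)\rfloor) + 157\right) \times \left(4n - w(n) - \lfloor\log_2 n\rfloor\right)$
Circuit Depth                      $\lfloor\log_2 n\rfloor + \lfloor\log_2 (n-1)\rfloor + \lfloor\log_2 (n/3)\rfloor + \lfloor\log_2 ((n-1)/3)\rfloor + 14$
Size (Number of Qubits)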


-3068+2896n-138w(n - l)-162w(n)+16Llog 2 ^J 

+64nLlog 2 ^J-146Lio 92 (n-l)J 

+64n [log 2 (n-l)J +16 [log 2 (n/3)J +64n [log 2 (n/3)J 

+21 \log 2 (n) J +64n [log 2 (n)J 

-16[log 2 (n - 1)J |>3 2 (n)J-16|tog 2 f J |to 52 (n)J-16 L«o S 2(n)J 2 + 

6x(4n-l+ Ap°^-V 2(n-2*p)+ 2£<7 = 2 1 ( " ) 2(2( [log 2 (n)j )+ (2*«)([iofl2(t fl )J) 

+ 2ElS ¥J 2(n-LW< c j)) 